Libckpt: Transparent Checkpointing under UNIX
Abstract
Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file from which it can be recovered after a failure. While recent research has developed a collection of powerful techniques for minimizing the overhead of writing checkpoint files, checkpointing remains unavailable to most application developers. In this paper we describe libckpt, a portable checkpointing tool for Unix that implements all applicable performance optimizations reported in the literature. While libckpt can be used in a mode which is almost totally transparent to the programmer, it also supports the incorporation of user directives into the creation of checkpoints. This user-directed checkpointing is an innovation which is unique to our work.
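As a rough illustration of the rollback-recovery idea in the abstract, the following C sketch saves a small program state to a disk file at intervals and resumes from it on restart. It is an application-level sketch only, not libckpt's API or method: the names `app_state`, `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical, and the code assumes a POSIX system where `rename` replaces an existing file atomically.

```c
#include <stdio.h>

/* Hypothetical application state (an assumption for illustration). */
struct app_state {
    long iteration;
    double accumulator;
};

/* Save the state to `path`. The data is written to a temporary file and
 * then renamed, so a failure during the write leaves the previous
 * checkpoint intact. Note: this does not force the data to stable
 * storage; a production version would also need fsync. */
int save_checkpoint(const char *path, const struct app_state *s)
{
    char tmp[512];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;
    int ok = fwrite(s, sizeof *s, 1, f) == 1;
    if (fclose(f) != 0)
        ok = 0;
    if (!ok) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Load the state from `path`; returns 0 on success, -1 if no usable
 * checkpoint exists. */
int load_checkpoint(const char *path, struct app_state *s)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int ok = fread(s, sizeof *s, 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Run until `total` iterations are done, resuming from the checkpoint
 * if one exists and saving a new one every `interval` iterations. */
void run(const char *path, long total, long interval, struct app_state *s)
{
    if (load_checkpoint(path, s) != 0) {
        s->iteration = 0;
        s->accumulator = 0.0;
    }
    while (s->iteration < total) {
        s->accumulator += (double)s->iteration;
        s->iteration++;
        if (s->iteration % interval == 0)
            save_checkpoint(path, s);
    }
}
```

In this sketch the programmer decides what to save and when, which is loosely in the spirit of user-directed checkpointing; a transparent tool would instead capture the process state without such changes to the program.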
Similar resources
Memory Exclusion: Optimizing the Performance of Checkpointing Systems
Checkpointing systems are a convenient way for users to make their programs fault-tolerant by intermittently saving program state to disk, and restoring that state following a failure. The main concern with checkpointing is the overhead that it adds to running time of the program. This paper describes memory exclusion, an important class of optimizations that reduce the overhead of checkpointin...
Compiler-Assisted Checkpointing
In this paper we present compiler-assisted checkpointing, a new technique which uses static program analysis to optimize the performance of checkpointing. We achieve this performance gain using libckpt, a checkpointing library which implements memory exclusion in the context of user-directed checkpointing. The correctness of user-directed checkpointing is dependent on program analysis and inser...
Checkpointing and Its Applications
This paper describes our experience with the implementation and applications of the Unix checkpointing library libckp, and identifies two concepts that have proven to be the key to making checkpointing a powerful tool. First, including all persistent state, i.e., user files, as part of the process state that can be checkpointed and recovered provides a truly transparent and consistent rollback....
DMTCP: Scalable User-Level Transparent Checkpointing for Cluster Computations
As the size of clusters increases, failures are becoming increasingly frequent. Applications must become fault tolerant if they are to run for extended periods of time. We present DMTCP (Distributed MultiThreaded CheckPointing), the first user-level distributed checkpointing package not dependent on a specific message passing library. This contrasts with existing approaches either specific to l...
Application-transparent checkpointing in Mach 3.0/UX
Checkpointing is perhaps the most explored of software-based recovery techniques, yet it has typically been developed only for special-purpose or research-oriented operating systems. This paper presents virtual memory checkpointing algorithms that have been designed for concurrent Unix applications using a hard disk as the stable storage medium. These algorithms can serve as the checkpointing s...
Publication date: 1995